Object detection
Code
- https://github.com/microsoft/computervision-recipes/tree/master/scenarios/detection
- #CODE MMdetection
- OpenMMLab Detection Toolbox and Benchmark (pytorch)
- https://mmdetection.readthedocs.io/
- #CODE TensorFlow object detection API
- #CODE Detectron2
- Detectron2 is FAIR's next-generation platform for object detection and segmentation
- #CODE https://github.com/jinwchoi/awesome-action-recognition#object-detection
- #CODE [YOLO: Real-Time Object Detection](https://github.com/ultralytics/ultralytics
References
- #PAPER Multiple Object Recognition with Visual Attention (Ba 2015)
- #PAPER Is object localization for free? – Weakly Supervised Object Recognition with Convolutional Neural Networks (2015):
- #PAPER DeepLab - Weakly-and semi-supervised learning of a DCNN for semantic image segmentation (Papandreou 2015)
- #PAPER SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation (Badrinarayanan 2016)
- #PAPER SSD. Single Shot MultiBox Detector (Liu 2016)
- #PAPER YOLO. You Only Look Once: Unified, Real-Time Object Detection (Redmon 2016)
- YOLO or You Only Look Once is an object detection algorithm much different from the region based algorithms. In YOLO a single convolutional network predicts the bounding boxes and the class probabilities for these boxes.
- How YOLO works is that we take an image and split it into an SxS grid, within each of the grid we take m bounding boxes. For each of the bounding box, the network outputs a class probability and offset values for the bounding box. The bounding boxes having the class probability above a threshold value is selected and used to locate the object within the image.
- #PAPER RetinaNet. Focal Loss for Dense Object Detection (Lin 2018)
- #PAPER CornerNet: Detecting Objects as Paired Keypoints (Law 2018)
- #PAPER #REVIEW Deep learning for Generic Object Detection: A Survey (Liu 2019)
- #PAPER #REVIEW Recent Advances in Object Detection in the Age of Deep CNNs (Agarwal 2019)
- #PAPER ExtremeNet. Bottom-up Object Detection by Grouping Extreme and Center Points (Zhou 2019)
- #PAPER EfficientDet: Scalable and Efficient Object Detection (Tan 2020)
- #CODE https://github.com/xuannianz/EfficientDet
- Included in the TF object detection API
- #CODE YOLOX: Exceeding YOLO Series in 2021 (Ge 2021)
Region-based CNNs (R-CNNs):
- https://www.pyimagesearch.com/2020/07/06/region-proposal-object-detection-with-opencv-keras-and-tensorflow/
- #BOOK Region-based RCNNs
- #PAPER Regional CNN (R-CNN)
- The goal of R-CNN is to take in an image, and correctly identify where the main objects (via a bounding box) in the image.
- R-CNN creates these bounding boxes, or region proposals, using a process called Selective Search.
- Once the proposals are created, R-CNN warps the region to a standard square size and passes it through to a modified version of AlexNet (the winning submission to ImageNet 2012 that inspired R-CNN).
- On the final layer of the CNN, R-CNN adds a Support Vector Machine (SVM) that simply classifies whether this is an object, and if so what object.
- #PAPER Fast R-CNN
- RoI (Region of Interest) Pooling. At its core, RoIPool shares the forward pass of a CNN for an image across its subregions.
- The second insight of Fast R-CNN is to jointly train the CNN, classifier, and bounding box regressor in a single model.
- #PAPER Faster R-CNN
- The insight of Faster R-CNN was that region proposals depended on features of the image that were already calculated with the forward pass of the CNN (first step of classification).
- So why not reuse those same CNN results for region proposals instead of running a separate selective search algorithm?
- A single CNN is used to both carry out region proposals and classification. This way, only one CNN needs to be trained and we get region proposals almost for free. Faster R-CNN adds a Fully Convolutional Network on top of the features of the CNN creating what’s known as the Region Proposal Network.
- #PAPER Mask R-CNN (He 2018)
- Extending Faster R-CNN for Pixel Level Segmentation
- Mask R-CNN does this by adding a branch to Faster R-CNN that outputs a binary mask that says whether or not a given pixel is part of an object. The branch, as before, is just a Fully Convolutional Network on top of a CNN based feature map.
- But the Mask R-CNN authors had to make one small adjustment to make this pipeline work as expected: Realigning RoIPool to be More Accurate.
- https://engineering.matterport.com/splash-of-color-instance-segmentation-with-mask-r-cnn-and-tensorflow-7c761e238b46
- https://modelzoo.co/model/mask-r-cnn-keras
- https://machinelearningmastery.com/how-to-train-an-object-detection-model-with-keras/